Dianqi Li

Sandboxed Coding Agents are Competitive Omni-modal Task Solvers

May 30, 2026

Useful Memories Become Faulty When Continuously Updated by LLMs

May 13, 2026

Superminds Test: Actively Evaluating Collective Intelligence of Agent Society via Probing Agents

Apr 24, 2026

DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

Jun 13, 2025

What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding

Jun 08, 2025

LangBridge: Interpreting Image as a Combination of Language Embeddings

Mar 26, 2025

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

Mar 25, 2025

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

Dec 05, 2024

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Oct 07, 2024

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Sep 24, 2024